Observations & variables

Quantitative Methodology (UPF)

Jordi Mas Elias

https://www.jordimas.cat/

Summary

  • Terminology
  • Observations
  • Variables
  • Tidy data
  • Pivoting

Terminology

Table

It's a generic name. It can be almost anything.

  • Periodic table
  • Multiplication table
  • Truth table
  • Chi-squared table

Spreadsheet

How Excel stores data in two dimensions, as a grid of rows and columns.

Dataframe

A way to store data in R in two dimensions:

# A tibble: 17,548 × 9
   scode country      year polity2 xrreg xrcomp xropen xconst parreg
   <chr> <chr>       <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 AFG   Afghanistan  1800      -6     3      1      1      1      3
 2 AFG   Afghanistan  1801      -6     3      1      1      1      3
 3 AFG   Afghanistan  1802      -6     3      1      1      1      3
 4 AFG   Afghanistan  1803      -6     3      1      1      1      3
 5 AFG   Afghanistan  1804      -6     3      1      1      1      3
 6 AFG   Afghanistan  1805      -6     3      1      1      1      3
 7 AFG   Afghanistan  1806      -6     3      1      1      1      3
 8 AFG   Afghanistan  1807      -6     3      1      1      1      3
 9 AFG   Afghanistan  1808      -6     3      1      1      1      3
10 AFG   Afghanistan  1809      -6     3      1      1      1      3
# … with 17,538 more rows
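As a minimal sketch of how such a dataframe can be built and inspected by hand, the block below re-creates the first three rows of the printout with only four of its nine columns. The object name `polity` is an assumption made for illustration, and the `tibble` package is assumed to be installed.

```r
library(tibble)

# Three rows copied from the printout above; only four columns are kept.
# The name `polity` is hypothetical.
polity <- tibble(
  scode   = c("AFG", "AFG", "AFG"),
  country = c("Afghanistan", "Afghanistan", "Afghanistan"),
  year    = c(1800, 1801, 1802),
  polity2 = c(-6, -6, -6)
)

dim(polity)     # number of rows and number of columns
names(polity)   # the column names
polity[2, ]     # a single row
polity$year     # a single column, returned as a vector
```

The two dimensions are the rows (indexed first inside the brackets) and the columns (indexed second, or by name with `$`).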

Observations

What is an observation?

The thing that we want to know about.

  • Unit of analysis: the entity we want to draw conclusions about.
  • Unit of observation: the entity on which the data are actually recorded.

Examples:

  • States
  • Bombings
  • Ethnic groups
  • Terrorist attacks

Types of dataset:

  • Monadic
  • Dyadic
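The difference between the two types of dataset can be sketched with base R. This is only an illustration under stated assumptions: the four states are invented, and the dyads are taken as unordered pairs (a directed-dyad dataset would have twice as many rows).

```r
# Hypothetical toy example: four invented states.
states <- c("A", "B", "C", "D")

# Monadic dataset: one row per state.
monadic <- data.frame(state = states)

# Dyadic dataset: one row per unordered pair of states.
pairs  <- t(combn(states, 2))
dyadic <- data.frame(state_a = pairs[, 1], state_b = pairs[, 2])

nrow(monadic)   # one observation per state
nrow(dyadic)    # one observation per pair, i.e. choose(4, 2)
dyadic
```

The same four states yield a different number of observations because the unit of observation changes from the state to the pair of states.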

The unit-of-analysis problem:

  • Singer
  • Ecological fallacy: “The rich are less corrupt”.
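A toy illustration of the ecological fallacy, using invented numbers rather than real measurements: the relationship between income and corruption is negative when the unit of analysis is the country, yet positive among the individuals inside every country. All variable names and values are assumptions made for this sketch.

```r
# Invented toy data (not real measurements): 3 countries, 3 people each.
people <- data.frame(
  country    = rep(c("A", "B", "C"), each = 3),
  income     = c(0, 1, 2, 3, 4, 5, 6, 7, 8),
  corruption = c(5, 6, 7, 3, 4, 5, 1, 2, 3)
)

# Unit of analysis = country: mean income and mean corruption per country.
by_country <- aggregate(cbind(income, corruption) ~ country,
                        data = people, FUN = mean)
cor_countries <- cor(by_country$income, by_country$corruption)

# Unit of analysis = individual: correlation computed inside each country.
cor_within <- sapply(
  split(people, people$country),
  function(d) cor(d$income, d$corruption)
)

cor_countries   # negative: richer countries are less corrupt on average
cor_within      # positive in every country
```

Inferring that rich individuals are less corrupt from the country-level result would be wrong for these data, which is the fallacy in miniature.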

Variables

What is a variable?

Types of variables (I): Nominal

Types of variables (II): Ordinal

Types of variables (III): Interval

Types of variables (IV): Ratio
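One possible way to represent the four types in R, sketched with invented values. The variable names are hypothetical; the point is only which operations are meaningful for each type.

```r
# Hypothetical values, invented only to illustrate the four types.

# Nominal: categories with no order.
regime <- factor(c("republic", "monarchy", "republic"))

# Ordinal: categories with an order, but no defined distance between them.
freedom <- factor(c("low", "high", "medium"),
                  levels = c("low", "medium", "high"),
                  ordered = TRUE)

# Interval: differences are meaningful, the zero point is arbitrary.
temp_celsius <- c(10, 20, 30)

# Ratio: differences and ratios are meaningful, zero means "none".
population <- c(100, 200, 400)

table(regime)                        # counting is all we can do with nominal data
freedom[1] < freedom[2]              # comparisons work for ordinal data
temp_celsius[2] - temp_celsius[1]    # differences work for interval data
population[2] / population[1]        # ratios work for ratio data
```

R stores both interval and ratio variables as plain numeric vectors, so the distinction between them has to be kept in mind by the analyst.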

Recoding variables
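A minimal sketch of recoding, turning a numeric score into an ordinal variable with base R's `cut()`. The scores are invented, and the cut-offs (autocracy at -6 or below, democracy at 6 or above, anocracy in between) are an assumption borrowed from a common convention for `polity2`, not something stated in these slides.

```r
# Hypothetical polity2 scores, invented for illustration.
polity2 <- c(-10, -6, -5, 0, 5, 6, 10)

# Assumed cut-offs: (-Inf, -6] autocracy, (-6, 5] anocracy, (5, Inf) democracy.
regime <- cut(
  polity2,
  breaks = c(-Inf, -6, 5, Inf),
  labels = c("autocracy", "anocracy", "democracy"),
  ordered_result = TRUE
)

data.frame(polity2, regime)   # original and recoded variable side by side
table(regime)                 # how many observations fall in each category
```

Recoding in this direction loses information: the ordinal variable can be derived from the score, but not the other way round.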

Tidy data

Wickham rules

A dataframe is tidy if it meets the following requirements (Wickham 2014):

  • Each dataframe has one unit of observation.
  • Observations are represented in the rows.
  • Variables are represented in the columns.
  • Each cell contains a single value.
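Some of these requirements can be checked mechanically. The sketch below uses an invented country-year dataframe and two rough base R checks; they are assumptions about what a reasonable check might look like, not a complete test of tidiness.

```r
# Hypothetical country-year data, invented for illustration.
tidy_df <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  score   = c(3, 4, 7, 8)
)

# One unit of observation (country-year): no country-year pair appears twice.
unique_units <- anyDuplicated(tidy_df[, c("country", "year")]) == 0

# Rough check on cells: every column is an atomic vector (no list-columns).
atomic_columns <- all(sapply(tidy_df, is.atomic))

unique_units
atomic_columns
```

Neither check can tell whether a column really is a variable or whether a text cell hides two values, so reading the data remains necessary.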

Pivoting dataframes

What does it mean?

We change the unit of analysis, the observations, and the variables of the dataframe while keeping the same information.

  • Pivot longer
  • Pivot wider

Pivot longer
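A minimal sketch of pivoting longer, assuming the `tidyr` package is installed. The data and the names `wide`, `long`, `year` and `score` are invented for illustration: a table with one row per country becomes a table with one row per country-year.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year.
wide <- data.frame(
  country = c("A", "B"),
  `2000`  = c(3, 7),
  `2001`  = c(4, 8),
  check.names = FALSE   # keep "2000" and "2001" as column names
)

# Stack every column except `country` into name-value pairs.
long <- pivot_longer(
  wide,
  cols      = -country,
  names_to  = "year",
  values_to = "score"
)

long
```

The former column names end up in `year` as text, so a conversion such as `as.numeric(long$year)` may be needed afterwards.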

Pivot wider
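A minimal sketch of pivoting wider, assuming the `tidyr` package is installed. The data and the names `long`, `wide`, `year` and `score` are invented for illustration: a table with one row per country-year becomes a table with one row per country.

```r
library(tidyr)

# Hypothetical long table: one row per country-year.
long <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  score   = c(3, 4, 7, 8)
)

# Spread the values of `score` into one column per value of `year`.
wide <- pivot_wider(long, names_from = year, values_from = score)

wide
```

The information is the same as before; only the unit of observation has changed, from the country-year to the country.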

Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23.